195 research outputs found
The Mirror DBMS at TREC-8
The database group at University of Twente participates in TREC8 using the Mirror DBMS, a prototype database system especially designed for multimedia and web retrieval. From a database perspective, the purpose has been to check whether we can get sufficient performance, and to prepare for the very large corpus track in which we plan to participate next year. From an IR perspective, the experiments have been designed to learn more about the effect of the global statistics on the ranking
The SIKS/BiGGrid Big Data Tutorial
The School for Information and Knowledge Systems SIKS and the Dutch e-science grid BiG Grid organized a new two-day tutorial on Big Data at the University of Twente on 30 November and 1 December 2011, just preceding the Dutch-Belgian Database Day. The tutorial is on top of some exciting new developments in large-scale data processing and data centers, initiated by Google, and followed by many others such as Yahoo, Amazon, Microsoft, and Facebook. The course teaches how to process terabytes of data on large clusters, and discusses several core computer science topics adapted for big data, such as new file systems (Google File System and Hadoop FS), new programming paradigms (MapReduce), new programming languages and query languages (Sawzall, Pig Latin), and new 'noSQL' databases (BigTable, Cassandra and Dynamo)
Runtime Optimizations for Prediction with Tree-Based Models
Tree-based models have proven to be an effective solution for web ranking as
well as other problems in diverse domains. This paper focuses on optimizing the
runtime performance of applying such models to make predictions, given an
already-trained model. Although exceedingly simple conceptually, most
implementations of tree-based models do not efficiently utilize modern
superscalar processor architectures. By laying out data structures in memory in
a more cache-conscious fashion, removing branches from the execution flow using
a technique called predication, and micro-batching predictions using a
technique called vectorization, we are able to better exploit modern processor
architectures and significantly improve the speed of tree-based models over
hard-coded if-else blocks. Our work contributes to the exploration of
architecture-conscious runtime implementations of machine learning algorithms
Multimedia search without visual analysis: the value of linguistic and contextual information
This paper addresses the focus of this special issue by analyzing the potential contribution of linguistic content and other non-image aspects to the processing of audiovisual data. It summarizes the various ways in which linguistic content analysis contributes to enhancing the semantic annotation of multimedia content, and, as a consequence, to improving the effectiveness of conceptual media access tools. A number of techniques are presented, including the time-alignment of textual resources, audio and speech processing, content reduction and reasoning tools, and the exploitation of surface features
Better contextual suggestions in ClueWeb12 using domain knowledge inferred from the open web
Proceedings of the 23rd Text Retrieval Conference (TREC 2014), held in Gaithersburg, Maryland, USA, on 2014This paper provides an overview of our participation in the Contextual
Suggestion Track. The TREC 2014 Contextual Suggestion Track allowed participants
to submit personalized rankings using documents either from the OpenWeb
or from an archived, static Web collection, the ClueWeb12 dataset. In this paper,
we focus on filtering the entire ClueWeb12 collection to exploit domain knowledge
from touristic sites available in the Open Web. We show that the generated
recommendations to the provided user profiles and contexts improve significantly
using this inferred domain knowledge.This research was supported by the Netherlands Organization for Scientific Research
(NWO project #640.005.001
- âŠ